NSF PAR Search | NSF Public Access Repository


Uniform-in-time Wasserstein stability bounds for (noisy) stochastic gradient descent

Zhu, Lingjiong; Gurbuzbalaban, Mert; Raj, Anant; Simsekli, Umut (February 2024, Advances in Neural Information Processing Systems)
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

Zhu, Lingjiong; Gurbuzbalaban, Mert; Raj, Anant; Simsekli, Umut (December 2023, Advances in neural information processing systems)

Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied to different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and for non-convex losses with additive noise, where we recover results similar to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalized to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time-uniform bounds, which might not be achieved for convex or non-convex losses unless additional noise is injected into the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.
Full Text Available
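The time-uniform stability claim in the abstract above can be seen in a toy setting. The sketch below is an illustration under stated assumptions, not the authors' Lyapunov-function argument: it assumes a hypothetical 1-D strongly convex loss f(θ; x) = (θ − x)²/2, two datasets of size n that differ only at one index j, and a synchronous coupling in which both SGD chains draw the same sample index at every step. With d_k = θ_k − θ'_k this gives d_{k+1} = (1 − η) d_k + η (x_j − x'_j)·1{i_k = j}, so starting from d_0 = 0 one gets E|d_k| ≤ (ηΔ/n) Σ_{m<k} (1 − η)^m ≤ Δ/n with Δ = |x_j − x'_j|, a bound that does not grow with k. Since any coupling upper-bounds the 1-Wasserstein distance, the Monte Carlo estimate below bounds W1 between the laws of the two iterates. The function name `coupled_sgd_gaps` and all parameter values are hypothetical.

```python
import random


def coupled_sgd_gaps(data, data_prime, step, n_iters, n_runs, seed=0):
    """Monte Carlo estimate of E|theta_k - theta'_k| for k = 1, ..., n_iters.

    Both chains run SGD on the 1-D loss f(theta; x) = 0.5 * (theta - x)**2
    (1-strongly convex) and draw the SAME sample index at every step
    (synchronous coupling). Any coupling upper-bounds the 1-Wasserstein
    distance, so each entry bounds W1(law(theta_k), law(theta'_k)).
    """
    if len(data) != len(data_prime):
        raise ValueError("datasets must have the same size")
    rng = random.Random(seed)
    n = len(data)
    totals = [0.0] * n_iters
    for _ in range(n_runs):
        theta = theta_prime = 0.0  # identical initialization for both chains
        for k in range(n_iters):
            i = rng.randrange(n)  # shared index couples the two chains
            theta -= step * (theta - data[i])
            theta_prime -= step * (theta_prime - data_prime[i])
            totals[k] += abs(theta - theta_prime)
    return [t / n_runs for t in totals]


# Neighbouring datasets of size n = 10 that differ only in the last point.
data = [float(v) for v in range(10)]
data_prime = data[:-1] + [data[-1] + 1.0]  # shift by delta = 1

gaps = coupled_sgd_gaps(data, data_prime, step=0.1, n_iters=300, n_runs=1000)
bound = 1.0 / len(data)  # delta / n, independent of the iteration count
print(f"k=10: {gaps[9]:.3f}  k=300: {gaps[-1]:.3f}  bound: {bound:.3f}")
```

Because the coupled chains only separate when the differing point is sampled, and the (1 − η) contraction from strong convexity shrinks that separation geometrically, the estimated gap levels off near Δ/n instead of growing with k. Without such a contraction (for example, plain SGD on a merely convex loss with no injected noise) this simple argument no longer applies, which is in the spirit of the abstract's remark on ergodicity.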
Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (October 2023, Proceedings of Machine Learning Research)

Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and the generalization behavior of SGD. To address these empirical phenomena theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they apply only to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light on the empirical observations, thanks to the generality of the loss functions.
Full Text Available
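The abstract above models SGD with a heavy-tailed SDE and then works with its discretization. As a rough illustration only (not the authors' construction), the sketch below simulates an Euler-type scheme θ_{k+1} = θ_k − η∇F(θ_k) + σ η^{1/α} S_k, where the S_k are standard symmetric α-stable variates drawn with the Chambers-Mallows-Stuck method; the η^{1/α} factor is how an α-stable Lévy increment scales over a time step η. The non-convex toy loss F(θ) = θ²/2 + 2 cos θ, the helper names, and the values of η, σ and α are hypothetical choices made for this example.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """One standard symmetric alpha-stable draw, 0 < alpha <= 2, alpha != 1.

    Chambers-Mallows-Stuck method with skewness 0. The characteristic
    function is exp(-|t|**alpha), so alpha = 2 is Gaussian with variance 2.
    """
    v = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


def heavy_tailed_sgd(grad, theta0, step, sigma, alpha, n_iters, rng):
    """Euler-type scheme for d theta = -grad(theta) dt + sigma dL^alpha_t.

    An alpha-stable Levy increment over time `step` scales like
    step ** (1 / alpha), the heavy-tailed analogue of sqrt(step).
    """
    theta = theta0
    scale = sigma * step ** (1 / alpha)
    for _ in range(n_iters):
        theta = theta - step * grad(theta) + scale * symmetric_stable(alpha, rng)
    return theta


def tail_fraction(alpha, threshold=5.0, n=20000, seed=1):
    """Empirical P(|S| > threshold) for S standard symmetric alpha-stable."""
    rng = random.Random(seed)
    hits = sum(abs(symmetric_stable(alpha, rng)) > threshold for _ in range(n))
    return hits / n


def toy_grad(theta):
    """Gradient of the hypothetical non-convex loss theta**2 / 2 + 2 * cos(theta)."""
    return theta - 2.0 * math.sin(theta)


rng = random.Random(0)
final = [heavy_tailed_sgd(toy_grad, 0.0, step=0.1, sigma=0.5, alpha=1.7,
                          n_iters=200, rng=rng) for _ in range(200)]
print(f"P(|S|>5): alpha=2.0 -> {tail_fraction(2.0):.4f}, "
      f"alpha=1.5 -> {tail_fraction(1.5):.4f}")
```

Lowering α from 2 (the Gaussian case) makes large jumps far more frequent, which `tail_fraction` makes visible. The toy gradient grows only linearly, so the occasional large jump is pulled back by the drift rather than causing the explicit scheme to diverge.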
Algorithmic stability of heavy-tailed SGD with general loss functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (August 2023, International Conference on Machine Learning)
Variational Principles for Mirror Descent and Mirror Langevin Dynamics

https://doi.org/10.1109/LCSYS.2023.3274069

Tzen, Belinda; Raj, Anant; Raginsky, Maxim; Bach, Francis (January 2023, IEEE Control Systems Letters)

Full Text Available
Algorithmic stability of heavy-tailed stochastic gradient descent on least squares

Raj, Anant; Barsbey, Melih; Gurbuzbalaban, Mert; Zhu, Lingjiong; Simsekli, Umut (January 2023, International Conference on Algorithmic Learning Theory)

Full Text Available
